Abstract — This technical report is an extended version of
the paper 'A Receding Horizon Algorithm for Informative 
Path Planning with Temporal Logic Constraints' accepted 
to the 2013 IEEE International Conference on Robotics and 
Automation (ICRA). 

This paper considers the problem of finding the most 
informative path for a sensing robot under temporal logic 
constraints, a richer set of constraints than have previously 
been considered in information gathering. An algorithm for 
informative path planning is presented that leverages tools 
from information theory and formal control synthesis, and is 
proven to give a path that satisfies the given temporal logic 
constraints. The algorithm uses a receding horizon approach 
in order to provide a reactive, on-line solution while mitigating 
computational complexity. Statistics compiled from multiple 
simulation studies indicate that this algorithm performs better 
than a baseline exhaustive search approach. 

I. Introduction 

In this paper we propose an algorithm for controlling a 
mobile sensing robot to collect the most valuable information 
in its environment, while simultaneously carrying out a 
required sequence of actions described by a temporal logic 
(TL) specification. Our algorithm is useful in situations 
where a robot's main objective is to collect information, 
but it must also perform pre-specified actions for the sake 
of safety or reliability. Consider searching for a survivor 
trapped in the rubble of a collapsed building. Our algorithm 
would drive the robot to locate the survivor while avoiding
obstacles and returning to a rescue worker to report on the
progress of its search. The obstacle avoidance and visit to 
the worker are represented as temporal logic constraints. In 
order to locate the survivor, the robot plans a path on-line, in 
a receding horizon fashion, such that it localizes the survivor 
as precisely as possible, while still satisfying the temporal 
logic constraints. 

This work brings together methods from information
theory and formal control synthesis to create new tools for
robotic information gathering under complex constraints.
More specifically, the robot uses a recursive Bayesian filter to
estimate the quantity of interest in its environment (e.g., the
location of a survivor) from its noisy sensor measurements.
The Shannon entropy of the Bayesian estimate is used as
a measure of the robot's uncertainty about the quantity of
interest. The robot plans a path to maximally decrease the
expected entropy of its estimate over a finite time horizon, 
subject to the TL constraints. The path planning is repeated 
at each time step as the Bayesian filter is updated with new 
sensor measurements to give a reactive, receding horizon 
planner. Our algorithm is guaranteed to satisfy the TL 
specification. We compare the performance of our algorithm 
to a non-reactive, exhaustive search method. Statistics
compiled from extensive simulations show that our receding
horizon algorithm gives a lower entropy estimate with
lower computational complexity than the exhaustive search
method.

The algorithm we present is applicable to many scenarios 
in which we want a robot to gather informative data, but 
where safety and reliability are critical. For example, our 
algorithm can be used by a mobile robot deployed on Mars 
that is tasked with collecting soil samples and images while 
gathering enough sunlight to charge its batteries and avoiding
dangerous terrain. In an animal population monitoring
scenario, our algorithm can drive a robot to count animals 
of a given species whose positions are unknown while 
avoiding sensitive flora and fauna, eventually uploading data 
to scientists. Our algorithm could also be used, for example, 
in active SLAM to control a robot to build a minimum 
uncertainty map [22] of its environment while avoiding walls 
and returning to a base station for charging. 

Extensive work already exists on using information-theoretic
tools in robotic information gathering applications.
Most of this work uses a one-step-look-ahead approach [5], 
[18], a receding horizon approach [4], [8], or an offline plan 
based on the sub-modularity property of mutual information 
[16], [20], [21]. The key innovation in our algorithm is 
that it gives a path which is guaranteed to satisfy rich 
temporal logic constraints. Temporal logic constraints can 
specify complex, layered temporal action sequences that 
are considerably more expressive than the static constraints 
considered in previous works. Indeed, much of the work in 
constrained informative path planning can be phrased as a 
special case of the TL constraints that we consider here. For 
example, the authors of [4] solve an information-gathering
problem in which an underwater agent must avoid high traffic 
areas and communicate with researchers — constraints which 
can be naturally expressed as TL statements. 

In this work, we consider a particular kind of temporal
logic called syntactically co-safe linear temporal logic
(scLTL) [12]. Synthesis of trajectories from scLTL
specifications is currently an active area of research [1], [3], as is
the use of receding horizon control to solve optimization
problems over TL-constrained systems. Receding horizon
control (RHC), sometimes referred to as model-predictive 
control, is a control technique in which current information is 
used to predict performance over a finite horizon [14], [17]. 
The authors of [23] use a receding horizon path planning 
algorithm that satisfies TL constraints in a provably correct 
manner and is capable of correcting navigational errors
on-line. In [7], the authors extend this principle to provide a
receding horizon algorithm for gathering time-varying,
deterministic rewards in a TL-constrained system. The analysis of
our informative planning algorithm was inspired by [7], with 
the significant difference that information gain is a stochastic 
quantity which depends on noisy sensor measurements. 

The paper is outlined as follows. We define the necessary
mathematical preliminaries in Section II. In Section III, we
formalize the scLTL-constrained informative path planning
problem. In Section IV, we present our receding horizon
algorithm and prove that it satisfies the scLTL constraints.
Results from simulations comparing our algorithm to a
baseline exhaustive search method are presented in Section V.
Finally, in Section VI, we give our conclusions and discuss
directions for future work.

As mentioned in the abstract, this report is an extended
version of a paper accepted to the 2013 ICRA conference.
The main additions are some information theory definitions
in Section II-A and a proof of Theorem 1 in Section IV.



II. Notation and definitions 

For a set $S$, we use $|S|$ and $2^S$ to denote its cardinality
and power set, respectively. $S \times T$ is the Cartesian product
of $S$ and $T$.

A. Information theory 

Information theory is a general mathematical theory of 
communication [19]. In this work, we borrow from information
theory measures of uncertainty and information content
of discrete random variables. 

We denote the range space of a discrete random variable $X$ as $R_X$, its realization as $x \in R_X$, and its probability mass function (pmf) as $p_X$. The Shannon entropy [19] (referred to in the sequel as simply "entropy") of a discrete random variable $X$ is

$$H(X) = H(p_X) = -\sum_{x \in R_X} p_X(x) \log\big(p_X(x)\big). \qquad (1)$$

The logarithm in (1) is base 2 by convention, and entropy is measured in units of "bits". The entropy of a random variable is a measure of its "randomness" or "uncertainty". For a fixed range space $R_X$, $H(X) \in [0, \log(|R_X|)]$. A uniformly distributed random variable achieves the upper bound and a deterministic variable achieves the lower bound.

For two discrete random variables $X$ and $Y$, the conditional entropy [6] of $X$ given $Y$ is

$$H(X \mid Y) = H(p_X \mid p_Y) = -\sum_{y \in R_Y} \sum_{x \in R_X} p_{X,Y}(x,y) \log\big(p_{X \mid Y}(x \mid y)\big). \qquad (2)$$

In general, $H(X \mid Y) \in [0, H(X)]$. $H(X \mid Y)$ is a measure of how random $X$ is if we are given knowledge of $Y$ and the statistical relationship between $X$ and $Y$.

The mutual information [6] between two random variables is

$$I(X;Y) = I(p_X; p_Y) = \sum_{y \in R_Y} \sum_{x \in R_X} p_{X,Y}(x,y) \log\left(\frac{p_{X,Y}(x,y)}{p_X(x)\, p_Y(y)}\right). \qquad (3)$$

The identity $I(X;Y) = H(X) - H(X \mid Y)$ leads to the natural interpretation of mutual information as the reduction in uncertainty about $X$ when we have knowledge of $Y$.

B. Transition systems and syntactically co-safe LTL

A weighted transition system [2] is a tuple $TS = (Q, q_0, Act, Trans, AP, L, d)$, where $Q$ is a set of states, $q_0 \in Q$ is the initial state, $Act$ is a set of actions, $Trans \subseteq Q \times Act \times Q$ is a transition relation, $AP$ is a set of atomic propositions, $L: Q \to 2^{AP}$ is a labeling function of states to atomic propositions, and $d: Trans \to \mathbb{R}$ is a weighting function over the set of transitions.

A finite state automaton (FSA) is a tuple $A = (\Sigma, \Pi, \Sigma_0, F, \Delta_A)$, where $\Sigma$ is a finite set of states, $\Pi$ is an input alphabet, $\Sigma_0 \subseteq \Sigma$ is a set of initial states, $F \subseteq \Sigma$ is a set of final (accepting) states, and $\Delta_A \subseteq \Sigma \times \Pi \times \Sigma$ is a deterministic transition relation.

An accepting run $r_A$ of an automaton $A$ on a finite word $w^0 w^1 \dots w^j$ over $\Pi$ is a sequence of states $r_A = \sigma^0 \sigma^1 \dots \sigma^{j+1}$ such that $\sigma^0 \in \Sigma_0$, $\sigma^{j+1} \in F$, and $(\sigma^i, w^i, \sigma^{i+1}) \in \Delta_A$ $\forall i \in [0, j]$.

The product automaton between a weighted transition system $TS = (Q, q_0, Act, Trans, AP, L, d)$ and an FSA $A = (\Sigma, \Pi, \Sigma_0, F, \Delta_A)$ with $\Pi = 2^{AP}$ is a tuple $\mathcal{P} = TS \times A = (Q \times \Sigma, \{q_0\} \times \Sigma_0, \Delta_{\mathcal{P}}, Q \times F, d')$ [2]. The transition relation and weighting are defined as $\Delta_{\mathcal{P}} = \{((q,\sigma), \pi, (q',\sigma')) \mid (q, \pi, q') \in Trans, (\sigma, L(q), \sigma') \in \Delta_A\}$ and $d'((q,\sigma), \pi, (q',\sigma')) = d(q, \pi, q')$, respectively.

An accepting run $r_{\mathcal{P}}$ on a finite word $\pi = \pi^0 \pi^1 \dots \pi^j$ is a sequence of states $r_{\mathcal{P}} = (q^0, \sigma^0)(q^1, \sigma^1) \dots (q^{j+1}, \sigma^{j+1})$ such that $(q^0, \sigma^0) \in \{q_0\} \times \Sigma_0$, $(q^{j+1}, \sigma^{j+1}) \in Q \times F$, and $((q^i, \sigma^i), \pi^i, (q^{i+1}, \sigma^{i+1})) \in \Delta_{\mathcal{P}}$ $\forall i \in [0, j]$.

The projection of a run $(q^0, \sigma^0) \dots (q^j, \sigma^j)$ from $\mathcal{P}$ to $TS$
is the run $q^0 \dots q^j$ over $TS$.
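The product construction and projection defined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the dictionary encodings of $Trans$, $L$, $\Delta_A$, and $d$, the helper names `product_automaton`, `project`, and `is_accepting`, and the toy system (three regions with $p$ holding only at region b, plus a two-state FSA for "eventually $p$") are all hypothetical.

```python
def product_automaton(ts_trans, labels, q0, fsa_delta, sigma0, weight):
    """Build Delta_P, d', and the initial states of P = TS x A.

    ts_trans:  set of (q, pi, q') triples (the relation Trans)
    labels:    dict q -> frozenset of atomic propositions (the map L)
    fsa_delta: dict (sigma, label) -> sigma' (a deterministic Delta_A)
    weight:    dict (q, pi, q') -> d(q, pi, q')
    """
    sigmas = {s for (s, _label) in fsa_delta} | set(fsa_delta.values()) | set(sigma0)
    delta_p, d_prime = set(), {}
    for (q, pi, q2) in ts_trans:
        for s in sigmas:
            s2 = fsa_delta.get((s, labels[q]))   # requires (sigma, L(q), sigma') in Delta_A
            if s2 is None:
                continue
            t = ((q, s), pi, (q2, s2))
            delta_p.add(t)
            d_prime[t] = weight[(q, pi, q2)]      # d'((q,s), pi, (q',s')) = d(q, pi, q')
    init = {(q0, s) for s in sigma0}              # {q0} x Sigma_0
    return init, delta_p, d_prime

def project(run):
    """Projection of a run (q^0, s^0) ... (q^j, s^j) on P to q^0 ... q^j on TS."""
    return [q for (q, _s) in run]

def is_accepting(x, final):
    """A product state (q, sigma) is accepting when it lies in Q x F."""
    return x[1] in final

# Hypothetical toy system: regions a -> b -> c, with proposition p holding at b.
EMPTY, HAS_P = frozenset(), frozenset({"p"})
ts_trans = {("a", "go", "b"), ("b", "go", "c")}
labels = {"a": EMPTY, "b": HAS_P, "c": EMPTY}
weight = {("a", "go", "b"): 1.0, ("b", "go", "c"): 2.0}
# Two-state FSA for "eventually p": state 1 is accepting and absorbing.
fsa_delta = {(0, EMPTY): 0, (0, HAS_P): 1, (1, EMPTY): 1, (1, HAS_P): 1}
final = {1}

init, delta_p, d_prime = product_automaton(ts_trans, labels, "a", fsa_delta, {0}, weight)
print(sorted(delta_p))
print(project([("a", 0), ("b", 0), ("c", 1)]))   # ['a', 'b', 'c']
```

The run $(a,0)(b,0)(c,1)$ is accepting because it ends in $Q \times F$. Following the definition of $\Delta_{\mathcal{P}}$, the automaton component reads the label of the state being left, so the switch to $\sigma = 1$ happens on the transition out of b.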

Syntactically co-safe linear temporal logic formulas are
made of atomic propositions along with the Boolean operators
"conjunction" ($\wedge$), "disjunction" ($\vee$), and "negation" ($\neg$),
where negation is applied only to atomic propositions, and
the temporal operators "until" ($\mathcal{U}$), "next" ($\bigcirc$), and
"eventually" ($\Diamond$) [12].



III. Problem formulation 



Our task is to find a path such that a robot following it fulfills a temporal logic task specification and also, on average, produces a low-entropy estimate of some a priori unknown quantity. We model a robot as a deterministic transition system over which we can evaluate temporal logic specifications, and we provide a model for incorporating new information into the robot's estimate. We use these models to formalize the scLTL-constrained informative path planning problem.

A. Robot motion model

We consider a robot with known kinematic state moving deterministically in an environment. Here we take a hierarchical view of path planning [10], [23] in which the problem is decomposed into the high-level problem of selecting waypoints on a graph to be followed by the robot and the low-level problem of selecting local trajectories between nodes. We assume that the low-level problem is solved and focus on high-level path planning. We partition the environment and take the quotient to form a transition system $TS = (Q, q_0, Act, Trans, AP, L, d)$ [2], where $Q$ is the set of regions in the partition and $q_0 \in Q$ is the region where the robot is located initially. $Act = \{u_1, \dots, u_{|Act|}\}$ is a set of finite-time control policies that can be enacted by the robot. A transition $(q_i, u_k, q_j) \in Trans$ is a pair of regions $q_i$ and $q_j$ and the control policy $u_k$ that can be applied to drive the robot from $q_i$ to $q_j$. $AP$ is a set of properties that can be assigned to regions in $Q$, and $L: Q \to 2^{AP}$ is the mapping giving the set of properties satisfied at each region. $d: Trans \to \mathbb{R}$ is a weighting over the transitions whose value corresponds to the cost of enacting the given control. We define the discretized time $k$, which is initialized to 0 and incremented by 1 after each transition. We denote the state of $TS$ at time $k$ as $q^k$.

B. Estimator and sensor dynamics

The robot is tasked with estimating an environmental feature modeled as the random variable $S$. We assume that the robot has onboard sensors and can take and process measurements related to $S$. We encapsulate the measurement and data processing performed during a transition on $TS$ at time $k$ as a report $y^k$. The report is drawn from a random variable $Y^k$ whose randomness encapsulates sensor noise. The pmf of $Y^k$ depends on the realization $s$ of $S$, the position of the robot, and sensor statistics. We can use this model to construct a likelihood function $f(y^k, s, q^k) = \Pr[Y^k = y^k \mid S = s, \text{robot at } q^k]$. The robot maintains an estimate pmf $p: R_S \times \mathbb{N} \to [0,1]$, where $p(s, l) = \Pr[S = s \mid \{Y^j = y^j\}_{j \in [0,l]}]$. After a transition is completed, the robot incorporates the report into $p$ via a Bayes filter

$$p(s, l+1) = \frac{f(y^{l+1}, s, q^{l+1})\, p(s, l)}{\sum_{s' \in R_S} f(y^{l+1}, s', q^{l+1})\, p(s', l)}. \qquad (4)$$

C. scLTL-constrained informative path planning

Our task is to select a sequence of transitions
$\{(q^i, u^i, q^{i+1})\}_{i \in [0,k-1]}$ such that the induced run $q^0 \dots q^k$
over $TS$, on average, produces the best estimate of $S$. The robot's knowledge of $S$ is given by its estimate $p(\cdot, k)$. We quantify the impact on $p(\cdot, k)$ of a set of transitions by using the mutual information $I(p(\cdot, k); \{Y^j\}_{j \in [0,k]})$, a frequently used measure of sensing quality in sensor networks, localization, and surveillance problems [5], [11], [15], [18]. Since our goal is to produce the best estimate, we naturally wish to maximize the mutual information. We may restate this objective by using the identity $I(p(\cdot, k); \{Y^j\}_{j \in [0,k]}) = H(p(\cdot, k)) - H(p(\cdot, k) \mid \{Y^j\}_{j \in [0,k]})$. The estimate $p(\cdot, \cdot)$ does not change over time if no new reports are received, so the first term equals the prior entropy $H(p(\cdot, 0))$ and does not depend on the chosen path. Maximizing the mutual information is therefore equivalent to minimizing the conditional entropy $H(p(\cdot, k) \mid \{Y^j\}_{j \in [0,k]})$.
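To make (1) through (4) concrete, the following minimal Python sketch applies the Bayes filter (4) to a single binary report and checks numerically that the mutual information (3) equals the prior entropy minus the expected posterior entropy, which is the identity used above. The prior, the detection probability 0.9, and the false-alarm probability 0.01 are illustrative assumptions, and the helper names are hypothetical.

```python
import math

def entropy(pmf):
    """Shannon entropy (1) in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def bayes_update(prior, likelihood, y):
    """One step of the Bayes filter (4) for a fixed robot location.

    likelihood[y][s] plays the role of f(y, s, q).
    Returns the posterior pmf and the normalizer Pr[Y = y].
    """
    unnorm = {s: likelihood[y][s] * prior[s] for s in prior}
    z = sum(unnorm.values())          # denominator of (4)
    return {s: v / z for s, v in unnorm.items()}, z

def expected_posterior_entropy(prior, likelihood):
    """E_Y[H(S | Y = y)], i.e. the conditional entropy (2) for one report."""
    total = 0.0
    for y in likelihood:
        posterior, p_y = bayes_update(prior, likelihood, y)
        total += p_y * entropy(posterior)
    return total

def mutual_information(prior, likelihood):
    """I(S; Y) computed directly from the joint pmf, as in (3)."""
    p_y = {y: sum(likelihood[y][s] * prior[s] for s in prior) for y in likelihood}
    mi = 0.0
    for y in likelihood:
        for s in prior:
            joint = likelihood[y][s] * prior[s]
            if joint > 0:
                mi += joint * math.log2(joint / (prior[s] * p_y[y]))
    return mi

# Hypothetical binary feature S observed by a noisy binary detector.
prior = {0: 0.5, 1: 0.5}
likelihood = {1: {0: 0.01, 1: 0.9},   # f(1, s, q): false alarm 0.01, detection 0.9
              0: {0: 0.99, 1: 0.1}}   # f(0, s, q) = 1 - f(1, s, q)

h_prior = entropy(prior)
h_cond = expected_posterior_entropy(prior, likelihood)
print(f"H(S) = {h_prior:.4f} bits, H(S|Y) = {h_cond:.4f} bits")
print(f"I(S;Y) via (3): {mutual_information(prior, likelihood):.4f} bits")
print(f"H(S) - H(S|Y):  {h_prior - h_cond:.4f} bits")
```

The last two printed values agree, which is the numerical counterpart of replacing mutual information maximization with conditional entropy minimization.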

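Problem 1: The scLTL-constrained informative path planning problem over $TS$ is the optimization

$$\begin{aligned}
\min_{\{(q^i, u^i, q^{i+1})\}_{i \in [0,k-1]}} \quad & E_{\{Y^j\}_{j \in [0,k]}}\big[H(p(\cdot, k) \mid \{Y^j\}_{j \in [0,k]})\big] \\
\text{subject to} \quad & q^0 q^1 \dots q^k \models \phi, \\
& (q^i, u^i, q^{i+1}) \in Trans \quad \forall i \in [0, k-1],
\end{aligned} \qquad (5)$$

where $\phi$ is an scLTL formula over $AP$, the likelihood function $f$ and initial pmf $p(\cdot, 0)$ are given, and $k$ is finite but not fixed.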

We discuss how to use model checking and optimization 
tools to solve this problem in the next section. 

IV. Receding horizon informative path planning 

From an scLTL formula $\phi$, we can construct an FSA $A_\phi$ that accepts only those words that satisfy $\phi$ [12], [13]. Given $TS$ and $\phi$, we can construct a product FSA $\mathcal{P} = TS \times A_\phi$. Accepting runs over $\mathcal{P}$ are given as finite words $(q^0, \sigma^0) \dots (q^k, \sigma^k)$ such that transitions between subsequent states are in $\Delta_{\mathcal{P}}$ and $\sigma^k \in F$. Problem 1 can be solved using the following procedure:

Algorithm 1 (Exhaustive Search):

1) From $TS$ and $\phi$, construct $\mathcal{P} = TS \times A_\phi$.

2) Enumerate all accepting runs of $\mathcal{P}$, i.e., all simple paths from $(q^0, \sigma^0)$ to states in $Q \times F$.

3) Project all accepting runs on $\mathcal{P}$ to runs over $TS$.

4) Calculate $E_{\{Y^j\}_{j \in [0,k]}}[H(p(\cdot, k) \mid \{Y^j\})]$ for each accepting run.

5) Select the trajectory with the minimum expected conditional entropy.
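Steps 2 through 5 of Algorithm 1 can be sketched as follows, assuming $\mathcal{P}$ has already been built (step 1) and is stored as a hypothetical adjacency dictionary over product states $(q, \sigma)$. The function passed as `expected_entropy` stands in for step 4 (for example, an evaluation of (6)); the toy cost used below, which simply rewards visiting more distinct regions, is an illustrative placeholder and not the objective of Problem 1.

```python
import math

def simple_paths(graph, start, accepting):
    """Step 2: all simple paths from start that end in an accepting product state."""
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] in accepting:
            paths.append(path)
        for nxt in graph.get(path[-1], []):
            if nxt not in path:                 # keep the path simple
                stack.append(path + [nxt])
    return paths

def exhaustive_search(graph, start, accepting, expected_entropy):
    """Steps 3-5: project each accepting run onto TS, score it, keep the minimizer."""
    best_run, best_cost = None, math.inf
    for path in simple_paths(graph, start, accepting):
        run = [q for (q, _sigma) in path]       # step 3: projection onto TS
        cost = expected_entropy(run)            # step 4: e.g. evaluate (6)
        if cost < best_cost:                    # step 5: minimum expected entropy
            best_run, best_cost = run, cost
    return best_run, best_cost

# Hypothetical product graph over states (q, sigma); sigma = 1 marks Q x F.
graph = {
    ("a", 0): [("b", 0), ("c", 0)],
    ("b", 0): [("d", 1), ("c", 0)],
    ("c", 0): [("d", 1)],
}
start, accepting = ("a", 0), {("d", 1)}

def toy_cost(run):
    """Placeholder for (6): visiting fewer distinct regions costs more."""
    return -len(set(run))

print(exhaustive_search(graph, start, accepting, toy_cost))  # (['a', 'b', 'c', 'd'], -4)
```

Because every accepting simple path is scored, the number of calls to `expected_entropy` grows with the number of such paths, which is the source of the cost analyzed in Section IV-A.2.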

The calculation of $E_{\{Y^j\}_{j \in [0,k]}}[H(p(\cdot, k) \mid \{Y^j\})]$ from a given run proceeds as follows. A run $q^0 \dots q^k$ over $TS$ induces a sequence of reports $y^0, \dots, y^k$. Using (4), we can find the estimate that would result from observing a given sequence of reports and calculate $H(p(\cdot, k) \mid \{y^j\}_{j \in [0,k]})$. We can use the given run, the prior estimate pmf $p(\cdot, 0)$, and the likelihood function $f$ to construct a pmf $p_{Y^0, \dots, Y^k}(y^0, \dots, y^k)$. Taking these together, we can calculate

$$E_{\{Y^j\}_{j \in [0,k]}}\big[H(p(\cdot, k) \mid \{Y^j\})\big] = \sum_{y^0, \dots, y^k \in R_Y} H(p(\cdot, k) \mid y^0, \dots, y^k)\, p_{Y^0, \dots, Y^k}(y^0, \dots, y^k). \qquad (6)$$
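Equation (6) can be evaluated by brute force: enumerate every report sequence, apply the Bayes filter (4) in unnormalized form, and weight each posterior entropy by the probability of its report sequence. The sketch below assumes binary reports, one report per state of the run, and a hypothetical three-hypothesis example ($S$ is the index of the single region containing a target) with made-up detection and false-alarm rates; it illustrates (6) and is not the code used in this work.

```python
import itertools
import math

def entropy(pmf):
    """Entropy (1) in bits."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def expected_conditional_entropy(run, prior, f, reports=(0, 1)):
    """Evaluate (6) for a projected run q^0 ... q^k.

    prior: dict s -> p(s, 0); f(y, s, q): likelihood of report y at region q.
    """
    total = 0.0
    for ys in itertools.product(reports, repeat=len(run)):
        # p(s, 0) * prod_j f(y^j, s, q^j): repeated application of (4) before normalizing
        weights = {s: prior[s] * math.prod(f(y, s, q) for y, q in zip(ys, run))
                   for s in prior}
        p_ys = sum(weights.values())            # p_{Y^0..Y^k}(y^0, ..., y^k)
        if p_ys == 0.0:
            continue                            # impossible report sequence
        posterior = {s: w / p_ys for s, w in weights.items()}
        total += p_ys * entropy(posterior)      # H(p(., k) | y^0..y^k) times its probability
    return total

# Hypothetical example: S in {0, 1, 2} is the region containing a target.
prior = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}

def noisy(y, s, q):
    p_alert = 0.9 if s == q else 0.05           # assumed detection / false-alarm rates
    return p_alert if y == 1 else 1.0 - p_alert

for run in ([], [0], [0, 1]):
    print(run, round(expected_conditional_entropy(run, prior, noisy), 4))
```

The number of report sequences is $|R_Y|^{k+1}$, so this direct evaluation is only practical for short runs, which is one motivation for the short horizons used below.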



A. Receding Horizon Control

The exhaustive search (Algorithm 1) produces a solution that is optimal in expectation. However, it is computationally expensive (see Section IV-A.2). Algorithms exist to mitigate the computational costs incurred by Algorithm 1 [16], [21]. Even so, Algorithm 1 gives a non-reactive trajectory computed before the robot collects any additional information about $S$. The optimal path is calculated based on the topology of $\mathcal{P}$, the sensor noise, and some initial guess $p(\cdot, 0)$. What if a sample path of $Y^k$ is atypical or $p(\cdot, 0)$ is a bad guess? After making $l < k$ transitions, we cannot guarantee that

$$\arg\min_{\{q^j\}_{j \in [l,k]}} E_{\{Y^j\}}\big[H(p(\cdot, k) \mid \{Y^j\}, \{y^m\}_{m \in [0,l]})\big]$$

is the same as the end of the trajectory calculated using Algorithm 1. Next, we propose an on-line receding horizon algorithm that addresses both the computational explosion and the lack of reactivity.

1) Algorithm description: In the RHC approach to Problem 1, we select some horizon $b$ and at each time $l$ solve the following problem:

$$\begin{aligned}
\min_{\{x^m_{opt}\}_{m \in [l+1, l+b]}} \quad & E_{\{Y^j\}}\big[H(p(\cdot, l+b) \mid \{Y^j\}_{j \in [l+1, l+b]}, \{y^m\}_{m \in [0,l]})\big] \qquad \text{(7a)}\\
\text{subject to} \quad & x^{l+b}_{opt} \in N_{r,x_f}(x^l, b), \qquad \text{(7b)}\\
& W(x^{l+b}_{opt}) < W(x^{l+b-1}_{pred}), \ \text{if } N_{r,x_f}(x^l, b) \neq \{x_f\}, \qquad \text{(7c)}\\
& x^{l+1}_{opt} \notin \{x^j\}_{j \in [l_r, l]}, \ \text{if } x^l \in N_r(x_f, b), \qquad \text{(7d)}
\end{aligned}$$

where $x^l = (q^l, \sigma^l)$ is the state of $\mathcal{P}$ at time $l$, $x^m_{opt}$ is a state in the optimal finite-horizon trajectory calculated at time $l$, $x^m_{pred}$ is a state in the optimal finite-horizon trajectory previously calculated at time $l-1$, $N_r(x, n)$ is the neighborhood of states about $x$ that are reachable in $n$ or fewer transitions, and $l_r$ is the minimum value of $l$ such that $x^l \in N_r(x_f, b)$. The optimization (7) is solved in the same manner as Algorithm 1: feasible paths on $\mathcal{P}$ over the short horizon are enumerated, projected back to $TS$, and evaluated by their expected impact on conditional entropy. The function $W: Q \times \Sigma \to \mathbb{R} \cup \{\infty\}$ is defined as

$$\begin{aligned}
x_0 &= (q_0, \sigma^0),\\
x_f &= \arg\max_{x_k \in Q \times F} D(x_0, x_k) \quad \text{s.t. } D(x_0, x_k) < \infty, \qquad (8)\\
W(x_j) &= D(x_j, x_f),
\end{aligned}$$

where $D(\cdot, \cdot)$ is the shortest graph distance between two states in $\mathcal{P}$. The distance between two adjacent states is given by the weighting $d'$. $N_{r,x_f}(x, n)$ is the constrained $n$-step reachability neighborhood

$$N_{r,x_f}(x, n) = \begin{cases} \{x_f\}, & \text{if } x_f \in N_r(x, n),\\ N_r(x, n), & \text{otherwise.} \end{cases} \qquad (9)$$

The extra conditions (7b)-(7d) ensure convergence to $x_f$ in finite time. Constraint (7b) ensures that if $x_f$ is reachable from the current position, the terminal state in the finite-horizon trajectory is $x_f$. Constraint (7c) is similar to a decreasing energy constraint used in Lyapunov convergence analysis: it ensures that the finite-horizon trajectory moves closer to an accepting trajectory as time increases. Condition (7d) ensures that $\mathcal{P}$ does not cycle infinitely between non-accepting states.
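The goal state $x_f$, the function $W$, and the neighborhoods $N_r$ and $N_{r,x_f}$ reduce to standard graph searches. The sketch below assumes $\mathcal{P}$ is stored as a hypothetical dictionary mapping each state to `(successor, d')` pairs, reads the definition of $x_f$ as selecting the farthest reachable accepting state, computes $W$ with Dijkstra's algorithm on the reversed graph, and counts transitions (not weights) for $N_r$. The helper names and the toy graph are illustrative.

```python
import heapq
import math

def dijkstra(graph, source):
    """Shortest weighted distances from source; graph maps x -> [(x', d'), ...]."""
    dist, heap, tie = {source: 0.0}, [(0.0, 0, source)], 1
    while heap:
        d, _, x = heapq.heappop(heap)
        if d > dist[x]:
            continue                              # stale heap entry
        for x2, w in graph.get(x, []):
            if d + w < dist.get(x2, math.inf):
                dist[x2] = d + w
                heapq.heappush(heap, (d + w, tie, x2))
                tie += 1                          # tie-breaker avoids comparing states
    return dist

def reverse(graph):
    rev = {}
    for x, edges in graph.items():
        for x2, w in edges:
            rev.setdefault(x2, []).append((x, w))
    return rev

def goal_and_w(graph, x0, accepting):
    """x_f (farthest reachable accepting state) and W(x) = D(x, x_f)."""
    from_x0 = dijkstra(graph, x0)
    reachable = [x for x in accepting if x in from_x0]
    if not reachable:
        return None, lambda x: math.inf           # W(x0) is infinite: no satisfying run
    xf = max(reachable, key=lambda x: from_x0[x])
    to_xf = dijkstra(reverse(graph), xf)          # distances to x_f via the reversed graph
    return xf, lambda x: to_xf.get(x, math.inf)

def n_r(graph, x, n):
    """N_r(x, n): states reachable from x in n or fewer transitions (x included)."""
    seen, frontier = {x}, {x}
    for _ in range(n):
        frontier = {x2 for y in frontier for x2, _w in graph.get(y, [])} - seen
        seen |= frontier
    return seen

def n_r_constrained(graph, x, n, xf):
    """N_{r,x_f}(x, n): {x_f} if x_f is within reach, otherwise N_r(x, n)."""
    reach = n_r(graph, x, n)
    return {xf} if xf in reach else reach

# Hypothetical product graph with two accepting states f1 and f2.
graph = {"x0": [("a", 1.0)], "a": [("f1", 1.0), ("b", 2.0)], "b": [("f2", 1.0)]}
xf, W = goal_and_w(graph, "x0", {"f1", "f2"})
print(xf, W("x0"), W("f1"))                       # f2 4.0 inf
print(n_r_constrained(graph, "a", 2, xf))         # {'f2'}
```

In this toy graph, $W$ is infinite at f1 because $x_f$ = f2 cannot be reached from it, which is exactly the situation property 2 of $W$ in the proof below describes.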

We construct a receding horizon algorithm adapted from
[7] to solve Problem 1.

Algorithm 2 (Receding Horizon):

$l = 0$
$x = (q_0, \sigma^0)$
while ($x \neq x_f$)
    $\{x^m_{pred}\}_{m \in [l+1, l+b-1]} = \{x^m_{opt}\}_{m \in [l+1, l+b-1]}$
    $\{x^m_{opt}\}_{m \in [l+1, l+b]} =$ solution to (7)
    $x = x^{l+1}_{opt}$
    $l = l + 1$
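A compact Python sketch of Algorithm 2 follows. It is a simplification under explicit assumptions, not the authors' implementation: $\mathcal{P}$ is a hypothetical unit-weight adjacency dictionary, so $W$ is a hop count and $x_f \in N_r(x, b)$ holds exactly when $W(x) \le b$; the same test is used as our reading of the condition in (7d); candidate plans are simple paths; and the `cost` argument is a placeholder for the expected conditional entropy in (7a).

```python
import math
from collections import deque

def hops_to(graph, target):
    """Unit-weight W(x) = D(x, x_f), computed by BFS on the reversed graph."""
    rev = {}
    for x, succs in graph.items():
        for y in succs:
            rev.setdefault(y, []).append(x)
    dist, queue = {target: 0}, deque([target])
    while queue:
        y = queue.popleft()
        for x in rev.get(y, []):
            if x not in dist:
                dist[x] = dist[y] + 1
                queue.append(x)
    return lambda x: dist.get(x, math.inf)

def horizon_plans(graph, x, b):
    """Simple paths leaving x with 1..b transitions (x itself excluded)."""
    plans, stack = [], [[x]]
    while stack:
        path = stack.pop()
        if len(path) > 1:
            plans.append(path[1:])
        if len(path) <= b:
            stack.extend(path + [y] for y in graph.get(path[-1], []) if y not in path)
    return plans

def receding_horizon(graph, x0, xf, b, cost):
    W = hops_to(graph, xf)
    if W(x0) == math.inf:
        raise ValueError("W(x0) is infinite: no satisfying run exists")
    x, run = x0, [x0]
    prev_terminal = None     # x_pred^{l+b-1}: terminal state of the previous plan
    visited_near = set()     # {x^j : j in [l_r, l]}, used by (7d)
    while x != xf:
        near = W(x) <= b     # x_f in N_r(x, b) under unit weights
        if near:
            visited_near.add(x)
        feasible = []
        for plan in horizon_plans(graph, x, b):
            if near:
                if plan[-1] != xf or plan[0] in visited_near:       # (7b), (7d)
                    continue
            elif len(plan) != b or (prev_terminal is not None
                                    and W(plan[-1]) >= W(prev_terminal)):  # b steps, (7c)
                continue
            feasible.append(plan)
        if not feasible:
            raise RuntimeError("no feasible finite-horizon plan")
        best = min(feasible, key=cost)    # placeholder for the objective (7a)
        prev_terminal = best[-1]
        x = best[0]                       # apply only the first transition
        run.append(x)
    return run

# Hypothetical line-shaped product graph 0 - 1 - 2 - 3 - 4 with x_f = 4.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(receding_horizon(line, 0, 4, b=2, cost=len))  # [0, 1, 2, 3, 4]
```

Only the first transition of each optimal plan is executed before re-planning; in the full algorithm, the estimate $p$ would be updated with the new report between iterations, which is what makes the plan reactive.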

If at least one satisfying run exists on $\mathcal{P}$ (i.e., if $W(x_0)$ is
finite), then any path produced by Algorithm 2 satisfies the
specification $\phi$. This is formalized in the following theorem.
The proof proceeds in a similar manner as in [7].

Theorem 1 (scLTL satisfaction): If $W(x_0) < \infty$, applying
Algorithm 2 to Problem 1 will result in an accepting run
on $\mathcal{P}$.

Proof:

This proof uses the following properties of $W(\cdot)$:

1) $W(x_j) = 0 \Leftrightarrow x_j = x_f$

2) $W(x_j) = \infty \Leftrightarrow x_f$ is not reachable from $x_j$

3) $0 < W(x_j) < \infty \Rightarrow \exists\, x_k \in N_r(x_j, 1)$ such that $W(x_k) < W(x_j)$

First, we must prove that if $W(x_0) < \infty$, (7) has a solution for every time $l$. If $x^l$ is such that $N_{r,x_f}(x^l, b) = \{x_f\}$, then the condition $W(x_0) < \infty$ and (7c) together imply $W(x^l) < \infty$. Thus, by the definition of $N_{r,x_f}(x^l, b)$ and (7b), there exists a trajectory of $b$ or fewer steps from $x^l$ to $x_f$ that does not include any previously selected states. If, on the other hand, $N_{r,x_f}(x^l, b) = N_r(x^l, b)$, consider the $(b-1)$-step trajectory given by $T_{b-1} = x^l x^{l+1}_{pred} \cdots x^{l+b-1}_{pred}$. Since this trajectory is in $N_r(x^l, b)$, it must also be in $N_r(x^l, b-1)$. By the third property of $W(\cdot)$, there is a neighbor $x_k$ of $x^{l+b-1}_{pred}$ such that $W(x_k) < W(x^{l+b-1}_{pred})$. Define a $b$-step trajectory $T_b = T_{b-1} x_k$. $T_b$ definitely exists and satisfies (7b)-(7d).

Given that a solution exists for every time $l$, we must prove that application of Algorithm 2 will drive $\mathcal{P}$ to the state $x_f$ in finite time. Define the function $a(l) = W(x^{l+b}_{opt})$. The strict inequality in (7c) forces $a$ to be strictly decreasing while $x_f$ is out of reach, and by definition of $W$, $a$ has a global minimum value of 0. $\mathcal{P}$ has a finite number of states, so $a$ takes values in the finite set $\{W(x) : x \in Q \times \Sigma\}$ and can decrease only finitely many times; hence there exists some finite time $l^*$ such that $a(l^*) = 0$. After $l^*$, all future states are constrained to $N_r(x_f, b)$.

Once $\mathcal{P}$ is in $N_r(x_f, b)$, (7c) is insufficient to guarantee convergence to $x_f$. However, imposing the non-repetition constraint (7d) forces $\mathcal{P}$ to converge to $x_f$ in at most $|N_r(x_f, b)|$ steps after entering $N_r(x_f, b)$.

∎

Note that, intuitively, in an environment with spatially
distributed information we expect longer paths generally
to be more informative. It may seem that using conditions
(7b)-(7d) to ensure convergence causes Algorithm 2 to
converge more quickly (produce shorter paths) than is desirable.
This effect is offset by the reactivity of the algorithm and the
selection of the optimal local trajectories at each time step.
Our approach can be adapted to further address path length
concerns by including minimum path length constraints in
Algorithm 2 or by specifying in the scLTL constraint $\phi$ a set
of spatially dispersed regions that the robot must visit.

2) Computational complexity: Define $\kappa(x_0, x_f, t)$ as the number of simple paths of length less than or equal to $t$ that connect $x_0$ to $x_f$. Let $t^*$ be the length of the longest simple path in $\mathcal{P}$, and let

$$\kappa(\mathcal{P}, t) = \max_{(x_0, x_f) \in (Q \times \Sigma)^2} \kappa(x_0, x_f, t).$$

Calculating the expected impact of each transition on the conditional entropy of the estimate requires $|R_S|$ calculations. The computational complexity of Algorithm 1 is therefore $O(|R_S|\, t^*\, \kappa(\mathcal{P}, t^*))$.

For Algorithm 2, consider a single solution of (7a) with short horizon $b$. The number of possible paths is bounded by $\xi\, \kappa(\mathcal{P}, b)$, where $\xi$ is the number of edges of the maximally connected state in $\mathcal{P}$. The complexity of a single solution to (7) is $O(|R_S|\, b\, \xi\, \kappa(\mathcal{P}, b))$. Constraints (7b)-(7d) mean that (7) is solved at most $N = |Q \times \Sigma|$ times. The complexity of Algorithm 2 is therefore $O(|R_S|\, b\, \xi\, N\, \kappa(\mathcal{P}, b))$.

A comparison of worst-case complexity depends on the
size and topology of $\mathcal{P}$. Note that for a product automaton of
large size and high enough connectivity, the function $\kappa(\mathcal{P}, \cdot)$
increases exponentially in path length. For such systems,
Algorithm 2 has the lower worst-case complexity. While
it may seem disingenuous to compare the complexity of an
off-line algorithm against an on-line algorithm, note that as
the size and connectivity of $\mathcal{P}$ grow, it becomes infeasible
to run Algorithm 1 in a reasonable amount of pre-mission
time before it becomes infeasible to run Algorithm 2
on-line.
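The growth of the path count ($\kappa$ in the analysis above) that drives this comparison is easy to observe numerically. The sketch below counts simple paths of at most $t$ transitions between opposite corners of an $n \times n$ four-connected grid, used here as a hypothetical stand-in for $\mathcal{P}$; it illustrates the combinatorics only and does not reproduce the complexity bounds.

```python
def grid_graph(n):
    """Hypothetical n x n four-connected grid as an adjacency dictionary."""
    return {(r, c): [(r + dr, c + dc)
                     for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= r + dr < n and 0 <= c + dc < n]
            for r in range(n) for c in range(n)}

def kappa(graph, source, target, t):
    """Number of simple paths from source to target with at most t transitions."""
    count, stack = 0, [[source]]
    while stack:
        path = stack.pop()
        if path[-1] == target:
            count += 1
            continue            # a simple path cannot return to the target
        if len(path) - 1 < t:
            stack.extend(path + [y] for y in graph[path[-1]] if y not in path)
    return count

for n in (2, 3, 4):
    g = grid_graph(n)
    corner, opposite = (0, 0), (n - 1, n - 1)
    shortest = 2 * (n - 1)      # length of a shortest corner-to-corner path
    longest = n * n - 1         # a simple path visits each of the n^2 states at most once
    print(n, kappa(g, corner, opposite, shortest), kappa(g, corner, opposite, longest))
```

Even on these small grids, the count at the longest horizon outgrows the count at the shortest one (184 versus 20 on the 4 × 4 grid), consistent with the exponential growth in path length noted above.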

V. Simulation Study 

We performed a simulation study demonstrating the use
of Algorithm 2 to solve Problem 1. We assumed that the
transition system is the quotient of a gridded partition and that
all neighboring regions are deterministically reachable. Our
variable of interest is $S = [S_j]_{j: q_j \in Q}$, where $R_{S_j} = \{0, 1\}$.
We assume that the $S_j$ are mutually independent. After
a transition to a new region, the robot returns a report
$y^k \in \{0, 1\}$. Our estimate is formed using a prior pmf and
the Bayesian filter (4).

We assume that the volume of a region $q \in Q$ is sufficiently small compared to the volume observable by the robot's sensors that the robot will receive information from adjacent regions. We model this overlap by a set $E_{meas}$, where the existence of an element $e_{jk} \in E_{meas}$ indicates that information from region $q_j$ can be gathered while the robot is in $q_k$. We assume here that $E_{meas} = \{e_{jk} \mid (q_j, u, q_k) \in Trans\}$, though this assumption does not need to hold in general. Each element of $E_{meas}$ is weighted according to the distance $d_M(q_j, q_k)$ that represents the amount of information contained in region $q_k$ that can be observed from region $q_j$. Define the observation neighborhood of $q_k$ as $N_o(q_k) = \{q_j \mid e_{jk} \in E_{meas}\} \cup \{q_k\}$. We assume independent correct report rates $u(q_k, q_j) = \Pr(Y^k = 1 \mid \text{agent at } q_k, S_j = 1)$ and a constant false alarm rate $r$ such that the overall alert likelihood is

$$f(1, s, q_k) = \begin{cases} r, & \text{if } s_j = 0 \ \forall j: q_j \in N_o(q_k),\\ 1 - \prod_{q_j \in N_o(q_k)} \big(1 - u(q_k, q_j)\big)^{s_j}, & \text{otherwise.} \end{cases} \qquad (10)$$

Since our reports are binary, we can calculate $f(0, s, q_k) = 1 - f(1, s, q_k)$. Our detection model is given by

$$u(q_k, q_j) = u_0\, e^{-\lambda\, d_M(q_k, q_j)}. \qquad (11)$$
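The alert likelihood (10) is a noisy-OR combination of the per-region detection rates from (11). The sketch below evaluates it for a single robot location. The values $u_0 = 0.9$, $\lambda = 0.01$, and $r = 0.01$ match the sensing parameters reported later in this section, while the two-region neighborhood, the distances $d_M$ (including a zero distance for the robot's own region), and the helper names are hypothetical.

```python
import math

def detection_rate(u0, lam, d_m):
    """Detection model (11): u(q_k, q_j) = u0 * exp(-lambda * d_M(q_k, q_j))."""
    return u0 * math.exp(-lam * d_m)

def alert_likelihood(s, nbhd, u, r):
    """f(1, s, q_k) from (10).

    s: dict region -> realization s_j in {0, 1}; nbhd: regions in N_o(q_k);
    u: dict region -> u(q_k, q_j); r: constant false-alarm rate.
    """
    if all(s[j] == 0 for j in nbhd):
        return r
    miss = 1.0
    for j in nbhd:
        miss *= (1.0 - u[j]) ** s[j]   # every occupied region in N_o(q_k) goes undetected
    return 1.0 - miss

def likelihood(y, s, nbhd, u, r):
    """f(y, s, q_k) for binary reports, using f(0, s, q_k) = 1 - f(1, s, q_k)."""
    p_alert = alert_likelihood(s, nbhd, u, r)
    return p_alert if y == 1 else 1.0 - p_alert

# Hypothetical observation neighborhood: the robot's own region "k" and neighbor "j".
u0, lam, r = 0.9, 0.01, 0.01
d_m = {"k": 0.0, "j": 5.0}
u = {q: detection_rate(u0, lam, d) for q, d in d_m.items()}
nbhd = ["k", "j"]
for s in ({"k": 0, "j": 0}, {"k": 1, "j": 0}, {"k": 1, "j": 1}):
    print(s, round(likelihood(1, s, nbhd, u, r), 4))
```

Functions like `likelihood` are what the Bayes filter (4) consumes as $f$. Note that the alert probability grows with the number of occupied regions in the neighborhood.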



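The propositions in our scenario are $AP = \{D1, D2, C, U\}$. The specification we wish to satisfy is "Visit D1 before visiting D2, and visit D2 before ending in C, while avoiding U". The task is formalized as

$$\begin{aligned}
\min_{\{q^j\}_{j \in [0,k]}} \quad & E_{\{Y^j\}}\big[H(p(\cdot, k) \mid \{Y^j\})\big]\\
\text{subject to} \quad & (\neg U \,\mathcal{U}\, C) \wedge (\neg C \,\mathcal{U}\, D2) \wedge (\neg D2 \,\mathcal{U}\, D1).
\end{aligned} \qquad (12)$$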
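Because each conjunct of (12) has the form $\neg a \,\mathcal{U}\, b$ with atomic $a$ and $b$, a DFA for the formula can be built by hand as the product of three three-state monitors (pending, satisfied, violated). The Python sketch below is such a hand-built automaton, shown for illustration only; it is not the output of scheck [13], and the function names are hypothetical.

```python
PENDING, SAT, VIOL = "pending", "satisfied", "violated"

# (12) as three "not bad U goal" clauses: (¬U U C), (¬C U D2), (¬D2 U D1).
CLAUSES = (("U", "C"), ("C", "D2"), ("D2", "D1"))

def step_clause(status, bad, goal, label):
    """Monitor for ¬bad U goal with atomic bad/goal, reading one label set."""
    if status != PENDING:
        return status                 # satisfied and violated are absorbing
    if goal in label:
        return SAT
    if bad in label:
        return VIOL
    return PENDING

def step(state, label):
    """One DFA transition for (12): advance the three clause monitors together."""
    return tuple(step_clause(st, bad, goal, label)
                 for st, (bad, goal) in zip(state, CLAUSES))

def accepts(labels):
    """True if the finite word of label sets L(q^0) L(q^1) ... satisfies (12)."""
    state = (PENDING,) * len(CLAUSES)
    for label in labels:
        state = step(state, label)
        if VIOL in state:
            return False
    return all(st == SAT for st in state)

print(accepts([set(), {"D1"}, set(), {"D2"}, {"C"}]))   # True
print(accepts([set(), {"D2"}, {"D1"}, {"C"}]))          # False: D2 before D1
```

Since the satisfied and violated outcomes are absorbing, once all three monitors are satisfied no suffix can change the verdict, which mirrors the finite-prefix acceptance of scLTL formulas.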



We generated a 5 × 5 grid-like abstraction with a fixed
initial state and a fixed terminal 'C' state. Our simulation was
constructed using NetworkX graph algorithms [9] and the
model-checking algorithms of scheck [13]. We performed
100 Monte Carlo trials with randomly placed 'D1', 'D2', and
'U' labels. The sensing parameters were $u_0 = 0.9$, $r = 0.01$,
and $\lambda = 0.01$. Weightings $d_M$ between adjacent states were
drawn from a uniform distribution over (0, 10), graph
distances between two adjacent states were set to 1,
and the $S_j$ were generated according to a Bernoulli
distribution with parameter $p = 0.08$. Figure 1 shows sample
paths resulting from using Algorithm 2 to solve (12). Each
run satisfied the given constraint specification. The average
terminal entropy $H(p(\cdot, k) \mid y^0, \dots, y^k)$ over all trials was
14.78 bits. The average CPU time required for construction
of $\mathcal{P}$ and optimal path finding per trial was 2.61 s
on a machine with a 2.66 GHz Intel Core 2 Duo processor
and 4 GB of memory.

In order to compare the performance of Algorithms 1 and 2, we solved (12) over a single 6 × 6 transition system generated in the same manner as above. We chose the larger system in order to make comparisons over a larger number of accepting runs. We used Algorithm 1 to find the optimal path in the constrained environment and constructed the pmf of the terminal entropy. We also performed 250 Monte Carlo trials using Algorithm 2 over the same environment and constructed the empirical pmf of the resulting terminal entropies. Histogram representations of the two pmfs are shown in Figure 2. The mean, median, and variance are 26.30 bits, 26.44 bits, and 3.67 bits², respectively, for the pmf from Algorithm 1, and 25.86 bits, 26.44 bits, and 2.80 bits², respectively, for the empirical pmf from Algorithm 2. These results are consistent with our intuition about reactivity and performance: Algorithm 2 performs better in expectation and has lower performance variability than Algorithm 1.

Fig. 1. Three sample paths, panels (a), (b), and (c), produced by our receding horizon algorithm to solve the scLTL-constrained informative path planning problem. The red transitions indicate the path followed by the robot, with arrows indicating direction. The specification is "Beginning at green, visit a light blue region, then visit a dark blue region, and finally visit the orange region while always avoiding red regions". This corresponds to the formula in (12), where the orange state is 'C', red states are 'U', light blue states are 'D1', and dark blue states are 'D2'.

VI. Conclusions and Future Work 

In this paper, we considered planning an informative path
for a robotic agent subject to temporal logic specifications.
We modeled the robot as moving deterministically on a graph
with noisy sensor measurements at each node. We proposed
a receding horizon algorithm for solving this problem in an
on-line, computationally efficient manner while still ensuring
specification satisfaction. We compared the performance of
our algorithm with an off-line exhaustive search method in a
simulation study. Our algorithm outperformed the exhaustive
search method, producing lower entropy estimates with less
computational overhead.

One natural extension to this work is to plan a path that optimizes some other quantity (e.g., path length or graph distance) subject to a minimum level of mutual information. That is, we make the information content of the path a constraint rather than an objective. Another possible extension is to consider cases in which the satisfaction of the temporal logic specification relies on some unknown quantity. Consider, for instance, a rescue robotics scenario in which the robot is tasked not only with finding survivors, but also with moving the survivors to a medical station. In this case, planning a path to the medical station is impossible until the robot knows the survivor's location. This extension would allow us to use formal synthesis methods in a more reactive manner. More generally, we expect that the fusion of information-theoretic tools with formal control synthesis will yield robotic control policies that are reactive to noisy, real-world environments while still providing provably correct performance.

Fig. 2. Histograms of (a) the pmf of the terminal entropy found when following the path from Algorithm 1 and (b) the empirical pmf of the terminal entropy that resulted when the paths were calculated using Algorithm 2. These histograms show that the mean and variance of the pmf of the terminal entropy are lower for the paths generated by Algorithm 2 than for the path generated by Algorithm 1. The lower mean indicates that using Algorithm 2 will result in a lower entropy estimate on average. The lower variability means that we are less likely to have a high entropy estimate when using Algorithm 2. Algorithm 1 took 1741 s of CPU time to complete, and Algorithm 2 took an average of 2.94 s of CPU time per execution to complete.